Neutralizing Monoclonal Antibody Use and COVID-19 Infection Outcomes

Key Points Question Are neutralizing monoclonal antibody (nMAb) therapies associated with reduced risk of adverse outcomes of COVID-19 in subpopulations at high risk of poor outcomes and across multiple SARS-CoV-2 variant epochs? Findings In this cohort study that included 167 183 nonhospitalized patients with COVID-19 who were eligible for nMAb treatment, the association between nMAbs and reduced risk of poor outcomes was strongest among immunocompromised and unvaccinated patients. Combining multiple factors into a risk estimation model to stratify patients further highlighted differences in outcomes; by variant, nMAb treatment was associated most strongly with reduced risk of poor outcomes in patients infected with the Delta variant, whereas the associations were weaker in patients infected with the Omicron BA.1 variant. Meaning These findings suggest that risk-targeting strategies are important for optimizing outcomes in patients receiving nMAbs for the treatment of COVID-19.


Genomics Data Processing Pipeline
Genome sequences were processed through the pipeline listed in eTable 2. Each pipeline step was distributed across several executor nodes, with executors being orchestrated by a driver node; outputs from each executor were organized into a central output dataset. Each step in the genomics pipeline is described below in terms of inputs, outputs, processing tools used, and, where relevant, parameter settings.

eTable 2. Genomics Pipeline Steps in Order of Execution With Data Quality Check Outputs Identified
Step  21 and included G4181T, C10029T, A11201G, A11332G, G9053T, C27874T, C19220T, C8986T, and G28916T. Delta samples with at least five of these nine 21J nucleotide variants were assigned to 21J and Delta strains with fewer than five of these nucleotide variants were assigned to 21AI.
The decision to classify Delta strains into the 21J and 21AI groups was based on an independent analysis of clusters within the Delta samples. For this, nucleotide variant data was analyzed with k-means clustering using Jaccard distances, which were calculated with the R package locstra. 22 To avoid spurious clusters due to missing data, the analysis included only nucleotide variants at genomic positions at which no more than 2% of the Delta samples were missing data due to sequencing gaps or ambiguous base calls. To characterize and illustrate clustering across samples from all WHO variants, a principal component analysis (PCA) was performed with Jaccard distances with nucleotide variants at positions with 2% or less missing data.
Annotations from SnpEff were used to identify putative effects of single nucleotide polymorphisms (SNPs) and insertions and deletions (indels) on gene function. The first SnpEff annotation for each nucleotide variant was used in these analyses. All calculations of frequencies of nucleotide variants excluded samples with ambiguous calls or gaps at the corresponding genomic positions.

Missing Data Imputation
For this study, imputation methods implemented were based on codes in the R mice package. 23 Predictive mean matching (PMM) was used as the primary imputation model in the Multiple Imputation by Chained Equations (MICE) algorithm. The PMM approach ensures that imputed values retain the same characteristics as the complete portion of the dataset. The MICE algorithm uses Monte Carlo Markov Chain (MCMC) methods to generate candidate values to replace missing covariates. This MCMC method begins from a random initial value and then generates a dependent parameter sequence by sampling the full conditional probability distributions for each parameter of a model for the missing data. Parameter samples drawn from the chains before the sampler converges are discarded as burn-in. Post-burnin parameter samples are used to generate a predicted value for the missing variable. Donor records (d=5) nearest to the predicted value are randomly sampled. The missing covariate is replaced with the donor value. Once all missing values for the covariate are imputed with this posterior parameter sample, the MICE algorithm moves to the next covariate and starts again. The MICE algorithm terminates after M=20 iterations through each covariate with one or more missing observations.
The influence of imputation-modeling methods was evaluated by comparing imputed results from the PMM method to results using Bayesian stochastic regression. No substantive differences were identified using the Bayesian stochastic regression approach compared to PMM. This indicates that statistical inferences derived from the imputed data were not substantively impacted by the choice of imputation model.
Variables used in PMM matched variables used in the subsequent propensity score and disease risk score models (eFigure 1), with the exception that the imputation models used the outcome variables as predictors in the imputation step. 24 Multiple imputation was carried out for each health system independently. Variables with more than 30% of their values missing were not imputed and were eliminated from consideration in downstream analysis. This missing values threshold excluded the following variables from further consideration: • Pneumococcal polysaccharide vaccine (Pneumovax 23) status • Pneumococcal conjugate vaccine (Prevnar 13) status • Influenza vaccine status and date • Index date vital signs (e.g., oxygen saturation percentage, respiratory rate, blood pressure, heart rate, temperature).
• COVID-19 symptoms and symptom onset date eFigure 1. Variable Groupings Used in Predictive Mean Matching, Propensity Score, and Disease Risk Score Models

Propensity Model
The propensity score (PS) in this study is defined as the probability of receiving nMAbs conditional on a set of observed covariates. The PS was used to compute weights in the marginal structural models (MSM). The variables included in the PS model are shown in eFigure 1Error! Reference source not f ound.. These variables were identified as measurable confounders by clinical experts, including feedback from participating health systems.
Propensity scores were estimated using logistic regression, gradient boosted trees (GBT), and random forest (RF). The estimation of multiple PS models was a priori specified in the statistical analysis plan. The PS models were implemented with Python's scikit-learn API. 25 A hyperparameter grid search was then performed on each model for model selection using hyperparameter options described in eTable 4 for logistic regression, eTable 5 for GBT, and eTable 6 for RF. The parameter "C" denotes the inverse of regularization strength. Parameters not specified in this description or in the tables used the default parameter values. All possible combinations of the hyperparameters in each row were implemented.
All logistic regression models had a maximum of 500 iterations, with a stopping tolerance of 1×10 −4 . Regularization was applied with both the ℓ1 ("Lasso") and ℓ2 ("Ridge") penalties. Every GBT model had a maximum of 500 estimators, with an early stopping tolerance of 1×10 Data was preprocessed as necessary (e.g., one-hot encoding and normalization). Logistic regression, GBT, and RF models were implemented and evaluated within each imputation group. Although the loss function used for PS model training optimizes the classification accuracy, model selection was determined by the optimal covariate balance.

Disease Risk Score
A patient's disease risk score (DRS) is the risk of the composite study outcome (30-day hospitalization or death) if the COVID-19 patient did not receive nMAb treatments. Development of the DRS model was guided by applicable research, clinical subject matter expertise, health system feedback, and best practices in machine learning. In this study, the DRS was used as an effect modifier for evaluating effectiveness.
A variety of modeling and analytic methods were explored while developing the DRS model, including logistic regression and various versions of decision trees. Logistic regression, RF, and GBT were evaluated for use in the DRS model. The highest ranking logistic regression model was used for this study because it was less likely to overtrain the model than the tree based methods, it was wellcalibrated, and the results and coefficients were easily interpreted.
The features used in the DRS model were identical to those used in the PS model. Data was preprocessed as necessary (e.g., one-hot encoding and normalization). The DRS models were implemented with Python's scikit-learn application programming interface (API). 25 A hyperparameter grid search was then performed on the first imputation group for parameter selection using hyperparameter options for logistic regression described in eTable 8. The parameter "C" denotes the inverse of regularization strength. The regularization strength penalizes model complexity. Therefore, the higher the regularization strength (or the lower its inverse, or C), the simpler the model. Parameters not specified in this description or in the tables used the default parameter values. All possible combinations of the hyperparameters in each row were implemented.
All logistic regression models had a maximum of 500 iterations, with a stopping tolerance of 1×10 −4 . Regularization was applied with both the ℓ1 ("Lasso") and ℓ2 ("Ridge") penalties. Elasticnet combines ℓ1 and ℓ2 penalties based on the ℓ1 Ratio (e.g., 0.25 ℓ1 Ratio means ℓ1 is applied at 25% and ℓ2 at 75%). Best parameters based on logistic regression grid search are highlighted in bold. The DRS was developed on 141,942 non-treated patients whose profiles were used to fit the model. The model was then applied to all 167,183 patients in the study regardless of nMAb treatment status. In general, one would expect the model to perform better on data used in training (the non-treated patients). In order to quantify the bias of the model on the non-treated population, an experiment was performed where an "experimental" model was trained on a subset of patients. 80% of the non-treated patients were used for training and the remaining 20% of the non-treated patients were used to test the performance of the model. Results of the experiment based on several metrics are presented in eTable 9 including counts of True Negatives (TN), False Positives (FP), False Negatives (FN), and True Positives (TP) as well as scores for Positive Predicted Values (PPV), Recall, and Matthews Correlation Coefficient (MCC). TN is the number of patients with DRS less than 0.5 who did not experience the study outcome (hospitalization and/or death). FP is the number of patients with DRS of 0.5 or above who did not experience the study outcome. FN the number of patients with DRS less than 0.5 who experienced the study outcome. TP is the number patients with DRS of 0.5 or above who were admitted to the hospital or died. The PPV is the fraction of relevant instances (patients who experienced the study outcome) among the retrieved instances (all patients with DRS > 0.5), while Recall is the fraction of relevant instances (all patients experiencing the study outcome) that were retrieved (having DRS of 0.5 or higher). MCC, which is similar to Pearson correlation coefficient in its interpretation, produces high scores if the DRS obtained good results in all four confusion matrix metrics (TN, FP, FN, TP). The experiment shows that DRS performs similarly on non-treated patients used to train the model and non-treated patients not used to train the model. The FP rate is higher for the treated population compared to the FP rate for the 20% of non-treated patients not used in model training. Since the model does not know if patients were treated, the treated false positives may be indicative of the effects of nMAbs in reducing the risk of bad outcomes. Therefore, treated false positive patients could have been true positive patients if the patients were not treated.

eTable 8. Hyperparameter Options Implemented for the Logistic Regression Grid Search
In this study, the DRS was ultimately used as an effect modifier for evaluating the effectiveness of nMAbs. While the false positive rate and recall rate were within acceptable bounds to use the DRS as an effect modifier, additional model tuning and refinement would be required to use the DRS as a predictive model for direct use in clinical applications. Any model used for clinical applications should also be validated in additional cohorts, such as data from other health systems outside the scope of this study that vary geographically and demographically.

eAppendix. Sensitivity Analyses
The findings of increased 30-day hospitalization rates in treated patients in the Omicron BA.1 variant epoch (January 2022) and with an Omicron BA.1 variant sequence were unexpected and initiated posthoc sensitivity analyses to confirm the results. For simplicity, all sensitivity analyses in this section were performed in the first imputation group.

Covariate Balance Within Variant Epochs
Health systems reported that the nMAb supply shortages in January 2022 required them to only treat patients most at risk for COVID-19 disease progression. Given the treatment decision in January 2022 was influenced by supply chain issues not observed in other study periods, the covariate balance by variant epoch using the stabilized weights was assessed to see if the weighting procedure did not adequately control for confounding in the Omicron BA.

Exact Matching for Treatment Effectiveness
To compare with the nMAb treatment effectiveness results using MSMs, exact matching was performed using the MatchIt package with sandwich standard errors to calculate significance and confidence intervals. Two different sets of variables were used to match treated with non-treated: Set 1 were variables found be strongly associated with treatment and strong effect modifiers. Set 2 included set 1 variables but also included additional variables that were among the top variables associated with treatment in the propensity model.

Use of Immunocompromised Subclasses
Participating health systems reported that the mAb supply shortages in January 2022 required them to only treat patients most at risk for COVID-19 disease progression. Each health system reported specifically targeting immunocompromised patients and tiered the immunocompromised patients into risk groups. The immunocompromised subclass definitions were gathered from all four health systems and reconciled into a single tiered system. The health systems were provided the subclassification definitions and asked to label each patient according to the scheme represented in eAppendix Table 7.  * "Active treatment" includes any patient with a documented drug exposure to one of the defined agents within the 90 days leading up to day 0 (the index date). **"Maintenance therapy": "Treatment that is given to help keep cancer from coming back after it has disappeared following the initial therapy. It may include treatment with drugs, vaccines, or antibodies that kill cancer cells, and it may be given for a long time." https://www.cancer.gov/publications/dictionaries/cancerterms/def/maintenance-therapy . Note that maintenance therapy can include agents that were also used in the initial therapy with the difference being the relative dose or frequency that is given to the patient.
The use of these refined immunocompromised subclasses instead of the binary (yes/no) immunocompromised status variable was tested to confirm the main study effectiveness results. To do this, the immunocompromised subclass variable was used as 1) a variable within the propensity model developed within each variant epoch used to make stabilized weights within the MSMs and 2) a matching variable used in the exact matching procedure.